AITopics | meta-gradient reinforcement learning

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

Neural Information Processing Systems | Dec-24-2025, 10:52:56 GMT

Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. Furthermore, because the objective is discovered online, it can adapt to changes over time. We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning. On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency, eventually outperforming the median score of a strong actor-critic baseline.
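The two-level update described in this abstract can be written schematically as follows. The notation is an assumption made here for illustration and is not taken from the paper: $\theta$ stands for the agent's parameters (value function and policy), $\eta$ for the parameters of the learned objective, $\alpha$ and $\beta$ for the inner and outer step sizes, and $L^{\text{inner}}$, $L^{\text{outer}}$ for the learned and the fixed evaluation losses.

```latex
% Inner step: the agent follows the objective defined by \eta.
\theta_{t+1} = \theta_t - \alpha \, \nabla_{\theta} L^{\text{inner}}(\theta_t ; \eta_t)

% Outer (meta) step: \eta is adjusted so that the updated agent
% scores better on a fixed outer loss, differentiating through the inner step.
\eta_{t+1} = \eta_t - \beta \, \nabla_{\eta} L^{\text{outer}}\bigl(\theta_{t+1}(\eta_t)\bigr)
           = \eta_t - \beta \, \frac{\partial L^{\text{outer}}}{\partial \theta_{t+1}} \, \frac{\partial \theta_{t+1}}{\partial \eta_t}
```

This is only the generic shape of a meta-gradient update; the abstract does not specify how many inner steps are unrolled or which outer loss is used.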

meta-gradient reinforcement learning, name change, objective discovered online, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Meta-Gradient Reinforcement Learning

Neural Information Processing Systems | Nov-20-2025, 21:56:37 GMT

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves. It is well-known that these decisions are crucial to the overall success of RL algorithms. We discuss a gradient-based meta-learning algorithm that is able to adapt the nature of the return, online, whilst interacting and learning from the environment. When applied to 57 games on the Atari 2600 environment over 200 million frames, our algorithm achieved a new state-of-the-art performance.
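To make the role of the return concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the scalar value estimate, the squared-error meta-objective and the reference discount `gamma_ref` are all assumptions chosen for illustration. `lambda_return` shows how the discount `gamma` and the bootstrapping parameter `lam` shape the learning target, and `meta_gradient_step` shows one way a discount could be adapted online by differentiating a held-out loss through a single value update.

```python
def lambda_return(rewards, next_values, gamma, lam):
    """Lambda-returns, computed backwards in time.

    G_t = r_{t+1} + gamma * ((1 - lam) * v(s_{t+1}) + lam * G_{t+1}),
    with the final step bootstrapping from the last value estimate.
    lam = 0 gives one-step TD targets; lam = 1 gives discounted sums
    of rewards plus a bootstrap at the end of the trajectory.
    """
    g = next_values[-1]
    returns = []
    for r, v in zip(reversed(rewards), reversed(next_values)):
        g = r + gamma * ((1.0 - lam) * v + lam * g)
        returns.append(g)
    returns.reverse()
    return returns


def meta_gradient_step(v, gamma, sample, held_out,
                       alpha=0.1, beta=0.01, gamma_ref=1.0):
    """One inner update of a scalar value estimate, then one meta-update of gamma.

    `sample` and `held_out` are hypothetical (reward, next_value) pairs;
    next values are treated as constants when differentiating.
    """
    r, v_next = sample
    target = r + gamma * v_next          # one-step return under the current gamma
    v_new = v + alpha * (target - v)     # inner update: v' = v + alpha * (g - v)
    dv_new_dgamma = alpha * v_next       # derivative of v' with respect to gamma

    r2, v_next2 = held_out
    target2 = r2 + gamma_ref * v_next2   # held-out return under a fixed reference discount
    # Meta-objective J' = (target2 - v')^2, so dJ'/dgamma = -2 (target2 - v') dv'/dgamma.
    meta_grad = -2.0 * (target2 - v_new) * dv_new_dgamma
    gamma_new = min(1.0, max(0.0, gamma - beta * meta_grad))  # keep gamma in [0, 1]
    return v_new, gamma_new


if __name__ == "__main__":
    print(lambda_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
    print(meta_gradient_step(0.0, 0.5, sample=(1.0, 2.0), held_out=(1.0, 2.0)))
```

In this toy setting the held-out target is larger than the updated estimate, so the meta-gradient step nudges `gamma` upwards. A deep RL agent would replace the scalar estimate with network parameters and obtain the derivative by automatic differentiation.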

algorithm, artificial intelligence, machine learning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Review for NeurIPS paper: Meta-Gradient Reinforcement Learning with an Objective Discovered Online

Neural Information Processing Systems | Jan-27-2025, 14:31:58 GMT

Strengths: The idea of formulating the inner loss for meta-RL as an objective that the agent discovers on its own is interesting and novel. Having the algorithm discover its own objective moves it one step closer to automated machine intelligence than conventional meta-RL methods, which rely heavily on experts' design choices, such as hyperparameters, to perform learning-to-learn. The authors present extensive experimental results to evaluate the proposed method. It is evaluated on three task domains: a Catch game, to demonstrate that the method can effectively learn bootstrapping; a 5-state random walk, to demonstrate that the method works in non-stationary environments; and ALE, a large-scale RL testbed. In all three domains, the proposed method achieves a noticeable performance improvement over the compared baselines.

meta-gradient reinforcement learning, neurips paper, objective discovered online, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Review for NeurIPS paper: Meta-Gradient Reinforcement Learning with an Objective Discovered Online

Neural Information Processing Systems | Jan-27-2025, 14:31:51 GMT

The reviewers agreed that this is an interesting, novel, and well-executed contribution. I would like to bring up two issues that were raised in the discussion, and ask the authors to address them in their final version. This should at least be mentioned/discussed.

meta-gradient reinforcement learning, neurips paper, objective discovered online

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Reviews: Meta-Gradient Reinforcement Learning

Neural Information Processing Systems | Oct-7-2024, 07:01:07 GMT

I would suggest that the authors restate it.

gamma, meta-gradient reinforcement learning, review, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Meta-Gradient Reinforcement Learning

Xu, Zhongwen, Hasselt, Hado P. van, Silver, David

Neural Information Processing Systems | Feb-14-2020, 10:41:23 GMT

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves.

algorithm, meta-gradient reinforcement learning, value function, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
